Do cars with big engines use more fuel than cars with small engines?
Skipping to the end
How did we do this?
library(ggplot2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    panel.background = element_blank(),
    plot.title.position = "plot",
    plot.title = element_text(face = "bold")
  ) +
  labs(
    title = "Relationship between engine displacement and highway miles per gallon by class",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon",
    color = "Class"
  )
Load relevant packages and data
# Load the relevant packages
library(tidyverse)
# Load the data
mpg
manufacturer  model  displ  year  cyl
audi          a4     1.8    1999  4
audi          a4     1.8    1999  4
audi          a4     2.0    2008  4
audi          a4     2.0    2008  4
audi          a4     2.8    1999  6
audi          a4     2.8    1999  6
EXERCISE
Learn more about this data set by typing ?mpg into your console.
Plot your data
library(ggplot2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
EXERCISE
What do you see when you run the following?
ggplot(data = mpg)
How many rows are in mpg? How many columns?
nrow(mpg)
ncol(mpg)
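As a small sketch of an alternative, base R's dim() reports both counts in one call, and names() lists the variables. The object name mpg_dims is only illustrative.

```r
library(ggplot2)  # provides the mpg data set

# dim() returns the number of rows and the number of columns together
mpg_dims <- dim(mpg)
mpg_dims

# The column names give a quick overview of the available variables
names(mpg)
```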
What does the drv variable describe?
?mpg
EXERCISE
Make a scatterplot of hwy vs cyl.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cyl))
What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
ggplot(data = mpg) +
  geom_point(mapping = aes(x = class, y = drv))
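One way to see why this plot is not useful, sketched here as a suggestion rather than the only answer: both variables are categorical, so many cars land on exactly the same point. Counting the combinations makes the overlap visible, and geom_count() is one option for showing it on the plot. The names combos and p_count are illustrative.

```r
library(ggplot2)  # provides the mpg data set

# Count how many cars share each class/drv combination
combos <- as.data.frame(table(class = mpg$class, drv = mpg$drv))
combos <- combos[combos$Freq > 0, ]
combos

# geom_count() sizes each point by the number of observations at that position
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv))
p_count
```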
Let’s look at groups in the data
We are not restricted to looking at only two interesting elements of our data.
You can use visual elements, called aesthetics (aes), such as color, size, and shape, to communicate several dimensions of your data at once.
Let’s look at a categorical variable: the class of car (SUV, 2 seater, pick up truck, etc.).
Look for meaningfully defined groups.
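Before coloring by a group, it can help to check which groups exist and how large they are. A minimal sketch using base R's table(); the name class_counts is illustrative.

```r
library(ggplot2)  # provides the mpg data set

# Tabulate the number of cars in each class
class_counts <- table(mpg$class)
class_counts
```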
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
You can use visual elements to communicate your findings in engaging ways.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class == "2seater"))
Changing the look of your plots
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "red")
EXERCISE
What’s gone wrong with this code? Why are the points not blue?
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
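A possible answer, sketched on the assumption that the goal was blue points: inside aes(), "blue" is treated as a data value rather than a color, so ggplot2 assigns it a color from its default scale and adds a legend. Setting color outside aes() applies it as a fixed property of the points. The names p_mapped and p_set are illustrative.

```r
library(ggplot2)

# Inside aes(): "blue" is treated as data, so the default color scale is used
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Outside aes(): the color is set directly on the points
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
p_set
```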
EXERCISE
Which variables in mpg are categorical? Which variables are continuous?
Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
What happens if you map the same variable to multiple aesthetics?
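A starting point for these questions, offered as a sketch rather than a full answer: inspecting the column types suggests which variables are categorical (character) and which are numeric, and mapping a numeric variable to color shows how a continuous aesthetic is drawn. The names col_types and p_gradient are illustrative.

```r
library(ggplot2)

# Character columns are the categorical candidates; the rest are numeric or integer
col_types <- vapply(mpg, function(col) class(col)[1], character(1))
col_types

# A continuous variable mapped to color is drawn as a gradient, not distinct hues
p_gradient <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = cty))
p_gradient
```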
Let’s add useful headings
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "red") +
  labs(
    title = "Relationship between engine displacement and highway miles per gallon",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  )
Let’s clean this up
Less is more when it comes to data visualization.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue") +
  theme_minimal() +
  labs(
    title = "Relationship between engine displacement and highway miles per gallon",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  )
Creating your own theme
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    panel.background = element_blank(),
    plot.title.position = "plot",
    plot.title = element_text(face = "bold")
  ) +
  labs(
    title = "Relationship between engine displacement and highway miles per gallon by class",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon",
    color = "Class"
  )
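If the same settings are needed on several plots, one option is to store the theme() call in an object and add it like a built-in theme. This is a sketch; theme_clean and p_themed are hypothetical names, and the settings are copied from the plot above.

```r
library(ggplot2)

# theme_clean is a hypothetical name for the stored theme settings
theme_clean <- theme(
  legend.position = "bottom",
  panel.grid = element_blank(),
  panel.background = element_blank(),
  plot.title.position = "plot",
  plot.title = element_text(face = "bold")
)

# Add the stored theme to any plot with +
p_themed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
  theme_clean
p_themed
```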
EXERCISE
Create a scatterplot of hwy vs displ, and map a categorical variable from the mpg data set to an aesthetic such as color.